ci(release): gate python wheels on e2e for tagged releases #319
Merged
- Add e2e to publish-python needs in release-tag.yml so wheels are not published to Artifactory until e2e passes
- Remove e2e gate from tag-ghcr-dev in release-dev.yml since dev Docker images do not need to wait for e2e
- Replace gitlab-master.nvidia.com references with generic example host in policy-advisor CTF example
…mand test

Switch the release canary from Docker-outside-of-Docker (host socket mount) to true Docker-in-Docker. The CI container now starts its own dockerd, so the gateway cluster container is a child process and 127.0.0.1 port bindings are reachable directly.

This enables testing the real zero-to-sandbox user path: a single `openshell sandbox create` that auto-bootstraps the gateway, pulls the cluster image, and creates a sandbox, with no --gateway-host workaround.

Dockerfile.ci changes:
- Add iptables (required by dockerd for container networking)
- Extract full Docker daemon suite (dockerd, containerd, runc) instead of CLI only

release-canary.yml changes:
- Remove /var/run/docker.sock volume mount
- Add dockerd startup step
- Remove gateway host resolution and explicit gateway start steps
- Simplify canary to single auto-bootstrap sandbox create command
The first canary run revealed two issues:

1. dockerd failed to start because docker-proxy was not extracted from the Docker static binary tarball. Add it to the extraction list.
2. The GitHub Actions runner injects its own Docker socket into job containers. Without an explicit DOCKER_HOST, the openshell CLI connected to the runner's host Docker daemon instead of our DinD daemon. Start dockerd on a dedicated socket (/var/run/dind.sock) and export DOCKER_HOST so all subsequent steps use it.
Using a custom socket path and DOCKER_HOST breaks the GitHub Actions runner's internal Docker operations (it uses docker exec to run steps inside the container). Since we removed the host socket volume mount, /var/run/docker.sock is free inside the container — just start dockerd on the default path with no DOCKER_HOST override needed.
The GHA runner injects its own /var/run/docker.sock into the container for management, so dockerd can't bind to the default path. Use a dedicated socket (/var/run/dind.sock) and set DOCKER_HOST only on steps that need it (via step-level env) to avoid breaking the runner.
Each GHA step runs via docker exec which sends SIGHUP to backgrounded processes when the shell exits. Use nohup to detach dockerd from the step's process group so it persists across steps.
setsid creates a new session and process group, ensuring dockerd survives when the GHA runner's docker-exec shell exits between steps.
Background processes started via docker-exec don't persist across GHA steps — each step gets a fresh docker-exec invocation. Move dockerd startup into the canary test step itself so it shares the same shell session and stays alive for the duration of the test.
The GHA container uses overlayfs, and the inner dockerd also defaults to overlayfs. Overlay can't be stacked, causing container creation to fail. Use --storage-driver=vfs which copies layers instead of layering them — slower but reliable for DinD.
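The DinD attempts described above (dedicated socket, same-step startup, vfs storage driver) can be combined into a rough sketch like the one below. The helper name `start_dind`, the log path, and the 30-second readiness loop are assumptions made for illustration, not the contents of release-canary.yml. The socket path and the `--storage-driver=vfs` flag are taken from the commit messages.

```shell
#!/bin/sh
# Sketch only: start a nested dockerd inside the same step that uses it.
# DIND_SOCK matches the path named in the commit messages; everything else
# (function name, log path, timeout) is a hypothetical choice.
DIND_SOCK="${DIND_SOCK:-/var/run/dind.sock}"
DIND_HOST="unix://${DIND_SOCK}"

start_dind() {
  # vfs avoids stacking overlayfs on top of the job container's overlayfs.
  dockerd --host="$DIND_HOST" --storage-driver=vfs >/tmp/dockerd.log 2>&1 &

  # Poll until the nested daemon answers, or give up after about 30 seconds.
  i=0
  while [ "$i" -lt 30 ]; do
    if DOCKER_HOST="$DIND_HOST" docker info >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "dockerd did not become ready; see /tmp/dockerd.log" >&2
  return 1
}

# Intended use, inside one step's script so the daemon shares its shell session:
#   start_dind && DOCKER_HOST="$DIND_HOST" openshell sandbox create
```

Scoping DOCKER_HOST to the single command leaves the runner's own /var/run/docker.sock untouched, which is the concern raised in the earlier commits. dockerd needs root and a privileged container, so this only applies where the job container allows that.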
Add OPENSHELL_GATEWAY_HOST environment variable support to the sandbox create auto-bootstrap path. This mirrors the --gateway-host flag on `gateway start` but works for the implicit bootstrap triggered by `sandbox create` when no gateway exists. In CI containers using Docker-outside-of-Docker (host socket mount), 127.0.0.1 inside the CI container doesn't reach sibling gateway containers. Setting OPENSHELL_GATEWAY_HOST=host.docker.internal fixes this without requiring the two-step gateway-start-then-sandbox-create workflow. Update release canary to use the single-command path: just `openshell sandbox create` which auto-bootstraps everything. For workflow_dispatch (branch testing), builds CLI from source to test the current branch code. For workflow_run (release testing), installs the published binary.
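As an illustration of the fallback this commit describes, the shell function below picks a gateway host. It is a sketch, not the CLI's implementation: the function name `resolve_gateway_host` is hypothetical, and the precedence (explicit value, then OPENSHELL_GATEWAY_HOST, then 127.0.0.1) is an assumption that the commit message does not confirm.

```shell
#!/bin/sh
# Sketch only: choose the host a client would use to reach the gateway.
# Precedence is assumed: explicit argument > OPENSHELL_GATEWAY_HOST > loopback.
resolve_gateway_host() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  elif [ -n "${OPENSHELL_GATEWAY_HOST:-}" ]; then
    printf '%s\n' "$OPENSHELL_GATEWAY_HOST"
  else
    printf '127.0.0.1\n'
  fi
}

# In a DooD CI container, the single-command path from the commit message
# would then look like:
#   OPENSHELL_GATEWAY_HOST=host.docker.internal openshell sandbox create
```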
Use the explicit --gateway-host flag on gateway start (works with current published CLI) while also setting OPENSHELL_GATEWAY_HOST env var (will be picked up once the next release ships with env var support). Once the env var support is released, the canary can switch to the single-command sandbox create path.
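The interim two-step canary could be sketched as follows. The `run` wrapper and its RUN dry-run switch are hypothetical helpers added so the sequence can be inspected without a Docker host. The `host.docker.internal` default is taken from the previous commit message, and the exact argument form of `--gateway-host` is an assumption.

```shell
#!/bin/sh
# Sketch only: explicit gateway start followed by sandbox create.
# Dry-run by default (prints the commands); set RUN=1 to execute them.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

canary() {
  host="${OPENSHELL_GATEWAY_HOST:-host.docker.internal}"
  # Step 1: start the gateway with an explicit host (works with the published CLI).
  run openshell gateway start --gateway-host "$host"
  # Step 2: create the sandbox against the gateway that is already running.
  run openshell sandbox create
}
```

Calling `canary` with RUN unset prints the two commands in order. Once the env var support ships, the first step could be dropped in favour of the single `openshell sandbox create` call.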
The canary uses DooD (host socket mount), not DinD, so the dockerd, containerd, runc, docker-proxy, and iptables additions are unnecessary.
The gateway host override is useful in any environment where the client can't reach the Docker host at 127.0.0.1 — CI containers, WSL, remote Docker hosts, etc. Update the CLI help text, DeployOptions doc comment, and bootstrap env var comment to reflect this.
drew added a commit that referenced this pull request Mar 16, 2026
Summary
Ensure all tagged release artifacts are gated by e2e tests, and remove unnecessary e2e gating from dev release Docker images.
Changes
- Add `e2e` to `publish-python` `needs` so Python wheels are not published to Artifactory until e2e passes
- Remove `e2e` from `tag-ghcr-dev` `needs` since dev Docker images don't need to wait for e2e
- Replace `gitlab-master.nvidia.com` references with `internal.corp.example.com`

Updated gate table
Testing
- `mise run pre-commit` passes (YAML-only changes, no code)

Checklist